Word juncture modelling based on the TIMIT database

Authors

  • Xue Wang
  • Louis C. W. Pols
Abstract

acoustic signal which simply represents the sequence of the actually pronounced phones. This might in turn reduce recognition accuracy. One solution to this problem is to use "phonological rules" ([2], for both French and English, and [3]) which account for various pronunciation variations in the language. However, it is hard to obtain reliable sets of such rules. Because of the compactness of such rules, many variation phenomena cannot be covered.

In this study, we develop data-based word juncture models, which account for the pronunciation variations at word boundaries, as an optional form of phonological rules. We used the American English TIMIT database. Issues in generating the models and using them in a continuous recognition task are discussed. A comparison is given between the coverage of the pronunciation variations by the models and by a set of phonological rules. There is a fairly good agreement between the models and the rules in predicting the pronunciation variations, whereas the models cover a larger set of variation phenomena. Furthermore, use of the models improved recognition performance.

In this study, we propose a different approach. We only concentrate on the pronunciation variations at word junctures, although generally within-word deviations may also occur. We directly generate word-juncture models based on the statistics of the pronunciation variations from a training data set. Then we use these models to predict those actual phone sequences deviating from the normative sequences. The actual phone sequences are used in word recognition. The whole process of generating and using the models is illustrated in Figure 1.
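The idea of deriving juncture models from training statistics and then using them to predict deviating phone sequences can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names `train_juncture_model` and `predict_variants`, the pair representation (last phone of one word, first phone of the next), the empty string as a marker for a deleted phone, the `min_prob` threshold, and the toy observations are all assumptions made for illustration.

```python
from collections import Counter, defaultdict


def train_juncture_model(observations):
    """Estimate relative frequencies of actual juncture realizations.

    observations: iterable of (normative_pair, actual_pair) tuples, where
    each pair is (last phone of word 1, first phone of word 2).
    An empty string stands for a deleted phone (an assumed convention).
    """
    counts = defaultdict(Counter)
    for normative, actual in observations:
        counts[normative][actual] += 1

    model = {}
    for normative, variants in counts.items():
        total = sum(variants.values())
        model[normative] = {actual: n / total for actual, n in variants.items()}
    return model


def predict_variants(model, normative, min_prob=0.1):
    """Return (actual_pair, probability) tuples, most probable first.

    Junctures never seen in training fall back to the normative form.
    """
    variants = model.get(normative, {normative: 1.0})
    kept = [(actual, p) for actual, p in variants.items() if p >= min_prob]
    return sorted(kept, key=lambda item: -item[1])


# Invented toy data: word-final /t/ before word-initial /d/,
# deleted in three of four training tokens.
observations = [
    (("t", "d"), ("t", "d")),
    (("t", "d"), ("", "d")),
    (("t", "d"), ("", "d")),
    (("t", "d"), ("", "d")),
]

model = train_juncture_model(observations)
variants = predict_variants(model, ("t", "d"))
unseen = predict_variants(model, ("s", "m"))
```

In a recognizer, the predicted variants would presumably be used to expand the normative phone sequence at each word boundary, with the estimated probabilities weighting the alternatives; how the original system does this is not stated in the text above.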

Related articles

LONGER-LENGTH ACOUSTIC UNITS FOR CONTINUOUS SPEECH RECOGNITION (ThuAmPO1)

Recent research on the TIMIT database suggests that longer-length acoustic units are better suited for modelling pronunciation variation and long-term temporal dependencies in speech than traditional phoneme-length units, yielding substantial improvements in recognition accuracy [9]. In this paper, we investigate whether similar improvements can be gained on another database, viz. excerpts from...

IPA: improved phone modelling with recurrent neural networks

This paper describes phone modelling improvements to the hybrid connectionist-hidden Markov model speech recognition system developed at Cambridge University. These improvements are applied to phone recognition from the TIMIT task and word recognition from the Wall Street Journal (WSJ) task. A recurrent net is used to map acoustic vectors to posterior probabilities of phone classes. The maxi...

Investigating modulation spectrogram features for deep neural network-based automatic speech recognition

Deep neural network (DNN) based acoustic modelling has been shown to yield significant improvements over Gaussian Mixture Models (GMM) for a variety of automatic speech recognition (ASR) tasks. In addition, it is also becoming popular to use rich speech representations, such as full-resolution spectrograms and perceptually motivated features, as input to the DNNs as they are less sensitive to t...

The Interplay of Perception and Production in Phonological Development: Beginnings of a Connectionist Model Trained on Real Speech

Three forward models are presented that map articulatory positions onto acoustic outputs for a single speaker of the MOCHA speech database. Backpropagation learning was used to train the forward models on a database of 460 TIMIT sentences. Efficacy of the trained models was assessed by subjecting the model outputs to speech intelligibility tests. The results of these tests showed that enough ph...

Phone recognition with hierarchical convolutional deep maxout networks

Deep convolutional neural networks (CNNs) have recently been shown to outperform fully connected deep neural networks (DNNs) both on low-resource and on large-scale speech tasks. Experiments indicate that convolutional networks can attain a 10–15 % relative improvement in the word error rate of large vocabulary recognition tasks over fully connected deep networks. Here, we explore some refineme...

Publication date: 1997